AITopics

Chien-Ju Ho, Rafael Frongillo, Yiling Chen

Eliciting Categorical Data for Optimal Aggregation

Neural Information Processing Systems, Feb-18-2026, 20:37:34 GMT

Neural Information Processing Systems http://nips.cc/

agent, aggregation, interface, (16 more...)

Country:

North America > United States > Colorado (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Neural Information Processing Systems, Feb-8-2026, 08:34:43 GMT

SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL

The left part is about some existing approaches, e.g., IRNet [

computational linguistic, machine learning, natural language, (18 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Asia > China > Hong Kong (0.04)
(14 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Razafindralambo, Raphaël, Sun, Rémy, Precioso, Frédéric, Garreau, Damien, Mattei, Pierre-Alexandre

When Are Two Scores Better Than One? Investigating Ensembles of Diffusion Models

arXiv.org Machine Learning, Jan-22-2026

Diffusion models now generate high-quality, diverse samples, with an increasing focus on more powerful models. Although ensembling is a well-known way to improve supervised models, its application to unconditional score-based diffusion models remains largely unexplored. In this work we investigate whether it provides tangible benefits for generative modelling. We find that while ensembling the scores generally improves the score-matching loss and model likelihood, it fails to consistently enhance perceptual quality metrics such as FID on image datasets. We confirm this observation across a breadth of aggregation rules using Deep Ensembles and Monte Carlo Dropout on CIFAR-10 and FFHQ. We investigate possible explanations for this discrepancy, such as the link between score estimation and image quality. We also look into tabular data through random forests, and find that one aggregation strategy outperforms the others. Finally, we provide theoretical insights into the summing of score models, which shed light not only on ensembling but also on several model composition techniques (e.g.

artificial intelligence, diffusion model, machine learning, (15 more...)

arXiv.org Machine Learning

2601.11444

Country: Europe > France (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Alami, Nabil, Zakharia, Jad, Taieb, Souhaib Ben

Symmetric Aggregation of Conformity Scores for Efficient Uncertainty Sets

arXiv.org Machine Learning, Dec-9-2025

Access to multiple predictive models trained for the same task, whether in regression or classification, is increasingly common in many applications. Aggregating their predictive uncertainties to produce reliable and efficient uncertainty quantification is therefore a critical but still underexplored challenge, especially within the framework of conformal prediction (CP). While CP methods can generate individual prediction sets from each model, combining them into a single, more informative set remains a challenging problem. To address this, we propose SACP (Symmetric Aggregated Conformal Prediction), a novel method that aggregates nonconformity scores from multiple predictors. SACP transforms these scores into e-values and combines them using any symmetric aggregation function. This flexible design enables a robust, data-driven framework for selecting aggregation strategies that yield sharper prediction sets. We also provide theoretical insights that help justify the validity and performance of the SACP approach. Extensive experiments on diverse datasets show that SACP consistently improves efficiency and often outperforms state-of-the-art model aggregation baselines.

nonconformity score, prediction, predictor, (14 more...)

arXiv.org Machine Learning

2512.06945

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Dec-9-2025

The MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024: Efficient and Robust Aggregation Methods for Federated Learning

Linardos, Akis, Pati, Sarthak, Baid, Ujjwal, Edwards, Brandon, Foley, Patrick, Ta, Kevin, Chung, Verena, Sheller, Micah, Khan, Muhammad Irfan, Jafaritadi, Mojtaba, Kontio, Elina, Khan, Suleiman, Mächler, Leon, Ezhov, Ivan, Shit, Suprosanna, Paetzold, Johannes C., Grimberg, Gustav, Nickel, Manuel A., Naccache, David, Siomos, Vasilis, Passerat-Palmbach, Jonathan, Tarroni, Giacomo, Kim, Daewoon, Klausmann, Leonard L., Shah, Prashant, Menze, Bjoern, Makris, Dimitrios, Bakas, Spyridon

We present the design and results of the MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024, which focuses on federated learning (FL) for glioma sub-region segmentation in multi-parametric MRI and evaluates new weight aggregation methods aimed at improving robustness and efficiency. Six participating teams were evaluated using a standardized FL setup and a multi-institutional dataset derived from the BraTS glioma benchmark, consisting of 1,251 training cases, 219 validation cases, and 570 hidden test cases with segmentations for enhancing tumor (ET), tumor core (TC), and whole tumor (WT). Teams were ranked using a cumulative scoring system that considered both segmentation performance, measured by Dice Similarity Coefficient (DSC) and the 95th percentile Hausdorff Distance (HD95), and communication efficiency assessed through the convergence score. A PID-controller-based method achieved the top overall ranking, obtaining mean DSC values of 0.733, 0.761, and 0.751 for ET, TC, and WT, respectively, with corresponding HD95 values of 33.922 mm, 33.623 mm, and 32.309 mm, while also demonstrating the highest communication efficiency with a convergence score of 0.764. These findings advance the state of federated learning for medical imaging, surpassing top-performing methods from previous challenge iterations and highlighting PID controllers as effective mechanisms for stabilizing and optimizing weight aggregation in FL. The challenge code is available at https://github.com/FeTS-AI/Challenge.
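The Dice Similarity Coefficient used for ranking can be illustrated with a minimal sketch. This is the textbook definition, 2|P ∩ T| / (|P| + |T|), applied to binary masks; it is not the challenge's evaluation code. The function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary masks.

    Assumption: when both masks are empty the score is defined as 1.0;
    the challenge's own tooling may use a different convention.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return 2.0 * intersection / denom


# Hypothetical 2x3 masks: 3 predicted voxels, 3 reference voxels, 2 shared.
pred_mask = np.array([[1, 1, 0], [0, 1, 0]])
ref_mask = np.array([[1, 0, 0], [0, 1, 1]])
score = dice_coefficient(pred_mask, ref_mask)
print(f"DSC = {score:.3f}")
```

In a multi-region setting such as ET, TC, and WT, a score like this would be computed once per region on the corresponding binary mask.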

data mining, machine learning, segmentation, (20 more...)

doi: 10.59275/j.melba.2025-5242

2512.06206

Country:

Europe (1.00)
North America > United States > California (0.46)

Genre: Research Report > Experimental Study (0.67)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.34)

Dadkhahi, Hamid, Trabelsi, Firas, Riley, Parker, Juraska, Juraj, Mirzazadeh, Mehdi

Distribution-Calibrated Inference time compute for Thinking LLM-as-a-Judge

arXiv.org Artificial Intelligence, Dec-3-2025

Thinking Large Language Models (LLMs) used as judges for pairwise preferences remain noisy at the single-sample level, and common aggregation rules (majority vote, soft self-consistency, or instruction-based self-aggregation) are inconsistent when ties are allowed. We study inference-time compute (ITC) for evaluators that generate n independent thinking-rating samples per item, and propose a principled, distribution-calibrated aggregation scheme. Our method models three-way preferences with a Bradley-Terry-Davidson formulation on rating counts, leveraging both polarity (margin among non-ties) and decisiveness (non-tie rate) to distinguish narrow margins from strong consensus. Across various evaluation benchmarks, our approach consistently reduces MAE and increases pairwise accuracy versus standard baselines, and when evaluated against human-consensus meta-labels, matches or exceeds individual human raters. These results show that carefully allocating ITC and aggregating with distribution-aware methods turns noisy individual model judgments into reliable ratings for evaluation.

Thinking large language models (LLMs) are increasingly being employed as automated judges for evaluating the output of other generative systems, a paradigm known as "Thinking-LLM-as-a-Judge" (Saha et al., 2025). This approach offers a scalable and cost-effective alternative to human evaluation, which is often slow and expensive. To mitigate the inherent stochasticity and noise of single-pass judgments, a common strategy is to leverage inference-time compute (ITC; Snell et al., 2024) by generating multiple independent reasoning and rating samples for each item being evaluated. However, the reliability of the final judgment hinges critically on how these multiple outputs are aggregated. Current aggregation methods, such as majority voting (Self-Consistency (Wang et al., 2023b)) or heuristics based on model confidence scores or LLM-generated aggregators, are often brittle and statistically suboptimal.
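The two count-based quantities named in the abstract, polarity (margin among non-ties) and decisiveness (non-tie rate), can be illustrated with a small sketch. The formulas below are one plausible reading of those parenthetical definitions, not the paper's exact estimator, and the sketch does not fit the Bradley-Terry-Davidson model. The labels "A", "B", and "tie" and the function name are hypothetical.

```python
from collections import Counter


def polarity_and_decisiveness(ratings):
    """Summarise n three-way ratings ("A", "B", or "tie") for one item.

    Assumed definitions (not taken from the paper):
      decisiveness = share of samples that are not ties
      polarity     = (A wins - B wins) / number of non-tie samples
    """
    counts = Counter(ratings)
    n = len(ratings)
    a_wins, b_wins = counts["A"], counts["B"]
    non_ties = a_wins + b_wins
    decisiveness = non_ties / n if n else 0.0
    polarity = (a_wins - b_wins) / non_ties if non_ties else 0.0
    return polarity, decisiveness


# Both items have "A" as the majority among non-ties, but the evidence differs.
narrow = polarity_and_decisiveness(["A", "B", "A"] + ["tie"] * 5)
strong = polarity_and_decisiveness(["A"] * 7 + ["tie"])
print("narrow margin:", narrow)
print("strong consensus:", strong)
```

A plain majority vote over non-ties returns "A" for both items; keeping the two quantities separate is what lets an aggregator tell a narrow margin from a strong consensus.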

calibration, large language model, machine learning, (21 more...)

2512.03019

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Chien-Ju Ho, Rafael Frongillo, Yiling Chen

Eliciting Categorical Data for Optimal Aggregation

Neural Information Processing Systems, Nov-20-2025, 21:34:03 GMT

Models for collecting and aggregating categorical data on crowdsourcing platforms typically fall into two broad categories: those that assume agents are honest and consistent but have heterogeneous error rates, and those that assume agents are strategic and seek to maximize their expected reward. The former often leads to tractable aggregation of elicited data, while the latter usually focuses on optimal elicitation and does not consider aggregation. In this paper, we develop a Bayesian model, wherein agents have differing quality of information, but also respond to incentives. Our model generalizes both categories and enables the joint exploration of optimal elicitation and aggregation. This model enables our exploration, both analytically and experimentally, of optimal aggregation of categorical data and optimal multiple-choice interface design.

artificial intelligence, bayesian inference, machine learning, (19 more...)

Country:

North America > United States > Colorado (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Collard, Julien, Gentine, Pierre, Zheng, Tian

Power Ensemble Aggregation for Improved Extreme Event AI Prediction

arXiv.org Artificial Intelligence, Nov-17-2025

This paper addresses the critical challenge of improving predictions of climate extreme events, specifically heat waves, using machine learning methods. Our work is framed as a classification problem in which we try to predict whether surface air temperature will exceed its q-th local quantile within a specified timeframe. Our key finding is that aggregating ensemble predictions using a power mean significantly enhances the classifier's performance. By making a machine-learning-based weather forecasting model generative and applying this non-linear aggregation method, we achieve better accuracy in predicting extreme heat events than with the typical mean prediction from the same model. Our power aggregation method shows promise and adaptability, as its optimal performance varies with the quantile threshold chosen, demonstrating increased effectiveness when predicting higher extremes.
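As an illustration of power-mean aggregation, the sketch below applies the standard generalized mean, M_p(x) = (mean(x^p))^(1/p), to a set of ensemble member probabilities. The abstract does not give the paper's exact formulation, so the function name, the exponent values, and the member probabilities are assumptions chosen for illustration.

```python
import numpy as np


def power_mean(values, p):
    """Generalized (power) mean of non-negative values along axis 0.

    p = 1 gives the arithmetic mean; p = 0 is handled as the geometric-mean
    limit and requires strictly positive inputs.
    """
    values = np.asarray(values, dtype=float)
    if p == 0:
        return np.exp(np.mean(np.log(values), axis=0))
    return np.mean(values ** p, axis=0) ** (1.0 / p)


# Hypothetical exceedance probabilities from three ensemble members.
members = np.array([0.1, 0.2, 0.9])
arithmetic = power_mean(members, 1.0)
emphasised = power_mean(members, 4.0)
print(f"p=1: {arithmetic:.3f}  p=4: {emphasised:.3f}")
```

With p greater than 1 the aggregate moves toward the largest member, so a single member that signals an extreme event carries more weight than under the plain average. Which exponent works best would have to be tuned per quantile threshold, as the abstract indicates.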

artificial intelligence, machine learning, prediction, (16 more...)

2511.1117

Country: Europe (0.28)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial Intelligence, Nov-11-2025

Reasoning Planning for Language Models

Nguyen, Bao, Nguyen, Hieu Trung, She, Ruifeng, Fu, Xiaojin, Nguyen, Viet Anh

Selecting an appropriate reasoning method for a given query remains a key challenge in language model generation. Existing approaches typically generate multiple candidate responses and use an aggregation strategy to select the output answer, often assuming that more candidate answers yield higher accuracy. We revisit this assumption through a rigorous theoretical analysis, deriving accuracy bounds for standard aggregation methods under fixed generation distributions and candidate sizes. Building on these insights, we introduce EPIC, an Ensemble Planning with Contrastive learning framework to learn a shared representation space that captures both model reasoning abilities and query-method compatibility. EPIC incorporates our probability bounds as a regularizer in a utility-driven optimization that balances accuracy and computational cost. Experiments on diverse mathematical reasoning tasks show that EPIC consistently selects optimal reasoning methods, improving accuracy while reducing computational overhead. Our code can be found at https://github.com/nguyenngocbaocmt02/EPIC.

large language model, machine learning, natural language, (16 more...)

2511.00521

Country:

Asia (0.46)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)